o 

•i-H 

a* 



> 



£ 



EXPANDING THE TRANSFER ENTROPY TO IDENTIFY INFORMATION 

CIRCUITS IN COMPLEX SYSTEMS 

S. Stramaglia 1 ' 2 , Guo-Rong Wu 3 ' 4 , M. Pellicoro 1 ' 2 , and D. Marinazzo 3 
1 Istituto Nazionale di Fisica Nucleare, Sezione di Bari, Italy 

2 Dipartimento di Fisica, University of Bari, Italy 

3 Faculty of Psychology and Educational Sciences, 
Department of Data Analysis, Ghent University, 

Henri Dunantlaan 1, B-9000 Ghent, Belgium 



4 Key Laboratory for N euro Information of Ministry of Education, 
{-*) ■ School of Life Science and Technology, 

^vj ' University of Electronic Science and Technology of China, 

Chengdu 610054, China. 

^ ; (Dated: February 26, 2013) 

I | We propose a formal expansion of the transfer entropy to put in evidence irreducible sets of 

variables which provide information for the future state of each assigned target. Multiplets charac- 
terized by a large contribution to the expansion are associated to informational circuits present in 
the system, with an informational character which can be associated to the sign of the contribution. 
For the sake of computational complexity, we adopt the assumption of Gaussianity and use the 
corresponding exact formula for the conditional mutual information. We report the application of 
the proposed methodology on two EEG data sets. 
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INTRODUCTION 



The inference of couplings between dynamical subsystems, from data, is a topic of general interest. Transfer entropy 
[l( , which is related to the concept of Granger causality [2j , has been proposed to distinguish effectively driving and 
responding elements and to detect asymmetry in the interaction of subsystems. By appropriate conditioning of 
^^ | transition probabilities this quantity has been shown to be superior to the standard time delayed mutual information, 
which fails to distinguish information that is actually exchanged from shared information due to common history and 
input signals [3, Q. On the other hand, Granger formalized the notion that, if the prediction of one time series could 
be improved by incorporating the knowledge of past values of a second one, then the latter is said to have a causal 
influence on the former. Initially developed for econometric applications, Granger causality has gained popularity 
also in neuroscience (see, e.g., |5|-[9[). A discussion about the practical estimation of information theoretic indexes for 
signals of limited length can be found in [10j . Transfer entropy and Granger causality are equivalent in the case of 
Gaussian stochastic variables PJJ: they measure the information flow between variables [12l |. Recently it has been 
shown that the presence of redundant variables influences the estimate of the information flow from data, and that 
maximization of the total causality is connected to the detection of groups of redundant variables [13| . 

In recent years, information theoretic treatment of groups of correlated degrees of freedom have been used to 
reveal their functional roles as memory structures or those capable of processing informatio n 1141 . Information theory 
suggests quantities that reveal if a group of variables is mutually redundant or synergetic [15|, [l6j . Most approaches for 
the identification of functional relations among nodes of a complex networks rely on the statistics of motifs, subgraphs 
of k nodes that appear more abundantly than expected in randomized networks with the same number of nodes and 
degree of connectivity [17|, [l8| . 

An interesting approach to identify functional subgraphs in complex networks, relying on an exact expansion of 
the mutual information with a group of variables, has been presented in pj|. In this work we generalize these results 
to show a formal expansion of the transfer entropy which puts in evidence irreducible sets of variables which provide 
information for the future state of the target. Multiplets of variables characterized by an high value, unjustifiable by 
chance, will be associated to informational circuits present in the system. Additionally, in applications where linear 
models are sufficient to explain the phenomenology, we propose to use the exact formula for the conditioned mutual 
information among Gaussian variables so as to get a computationally efficient approach. An approximate procedure 
is also developed, to find informational circuits of variables starting from few variables of the multiplet by means of 
a greedy search. We illustrate the application of the proposed expansion to a toy model and two real EEG data sets. 



The paper is organized as follows. In the next section we describe the expansion and motivate our approach. In 
section III we report the applications of the approach and describe our greedy search algorithm. In section IV we 
draw our conclusions. 

II. EXPANSION 

We start describing the work in pj|. Given a stochastic variable X and a family of stochastic variables {Yk} k=1 , 
the following expansion for the mutual information, analogous to a Taylor series, has been derived there: 

S(X\{Y})-S(X) = -I(X;{Y}) = 

v ASjX) >p A 2 S(X) A n S(X) (I) 

l^i AYi ' ^i>j AYiAYj ' " ' ' "" AY v --AY n ' 



where the variational operators are defined as 

AS(X) 



AYi 
A 2 S(X) _ AI{X-Yi) 



AYiAYj AYj 



S(X\Y l )-S(X) = -I(X-Y l ), (2) 

IiX^-IiX-YilYj), (3) 



A ' )S(X) ~ I (X;Y z \Y k ) - I (X^Yj^Yk) - I (X;^) + I (X'^Yj), (4) 



AYiAYAY k 



and so on. 

Now, let us consider n + 1 time series {x a (t)} a= o^.^ n . The lagged state vectors are denoted 

Y a (t) = (x a (t-m),...,x a (t- 1)), 

m being the window length. 

Firstly we may use the expansion (pp) to model the statistical dependencies among the x variables at equal times. 
We take xo as the target time series, and the first terms of the expansion are 

W° = -I(x ; Xi ) (5) 

for the first order; 

Z^ = I(x ;xi) - I(x ;xi\xj) (6) 

for the second order; and so on. We note that 

Z/— = X [Xq] Xi\ Xj ) , 

where X (xq ; X{ ; Xj ) is the interaction information, a well known information measure for sets of three variables [20| ; 
it expresses the amount of information (redundancy or synergy) bound up in a set of variables, beyond that which 
is present in any subset of those variables. Unlike the mutual information, the interaction information can be either 
positive or negative. Common-cause structures lead to negative interaction information . As a typical example of 
positive interaction information one may consider the three variables of the following system: the output of an XOR 
gate with two independent random inputs (however some difficulties may arise in the interpretation of the interaction 
information, see [2l|). It follows that positive (negative) Zf, corresponds to redundancy (synergy) among the three 
variables #o, %i and Xj. 

In order to go beyond equal time correlations, here we propose to consider the flow of information from multiplets 
of variables to a given target. Accordingly, we consider 

S(x \{Y k }t =1 )-S(xo) = -I(x ;{Y k }t =1 ), (7) 

which measures to what extent all the remaining variables contribute to specifying the future state of x$. This quantity 
can be expanded according to (pQ): 

S(x \{Y k r k=1 )-S(x ) = 

v AS(xq) v A 2 S(x ) A n S(x ) (P) 

2-^i AYi ~ l ~ ^i>3 AYiAYj ^ " r AYi-AY n ' 



A drawback of the expansion (|7J) is that it does not remove shared information due to common history and input 
signals; therefore we choose to condition it on the past of xq, i.e. Yq. To this aim we introduce the conditioning 
operator Cy : 

C Yo S(X) = S(X\Y ), 

and observe that Cy and the variational operators (|2]) commute. It follows that we can condition the expansion (J8j) 
term by term, thus obtaining 

S(xo\{Y k r k=1 ,Yo)-S(xo\Y ) = -I(x ; {Y}% =1 \Y ) = 

v AS(xo\Y ) v A 2 S(x \Y ) A n S(x \Y ) (9) 

2-^i AYi ^ l^i>j AYiAYj ' " r AY; -AY™ " 

The first order terms in the expansion are given by: 

A o = AS(^ = _ I{xo . YilYoh (1Q) 

and coincide with the bivariate transfer entropies i — >• (times -1). The second order terms are 

s°. = i(z ;*TO-/(zo;^|y,-,io), (n) 

and may be seen as a generalization of the interaction information X; hence a positive (negative) B^- corresponds to 
a redundant (synergetic) flow of information {i, j} — ^ 0. The typical examples of synergy and redundancy, in the 
present framework of network analysis, are the same as in the static case, plus a delay for the flow of information 
towards the target. The third order terms are 

C% k = Hxo^ilYj^ + Iixo-YilY^Yo) 

-I(xo;Yi\Yo)-I(xo;Yi\Yj,Y k ,Yo), { ' 

and so on. 

The generic term in the expansion (J9|, 

= A*S(xo\Y ) 
k AYi---AY k ' V ; 

is symmetrical under permutations of the Y{ and, remarkably, statistical independence among any of the Y{ results 
in vanishing contribution to that order. Therefore each nonvanishing accounts for an irreducible set of variables 
providing information for the specification of the target: the search for for informational multiplets is thus equivalent 
to search for terms ([T3]) which are significantly different from zero. Another property of ([9]) is that the sign of each 
term is connected to the informational character of the corresponding set of variables, see |19j). 

For practical applications, a reliable estimate of conditional mutual information from data should be used. Non 
parametric methods are recommendable when nonlinear effects are relevant. However, a conspicuous amount of 
phenomenology in brain can be explained by linear models: therefore, for the sake of computational load, In this 



work we adopt the assumption of Gaussianity and use the exact expression that holds in this case [ll| , which reads 



as follows. Given multivariate Gaussian random variables X, W and Z, the conditioned mutual information is 

where | • | denotes the determinant, and the partial covariance matrix is defined 

T,(X\Z) = E(X) - E(X, Z)S(Z)- 1 S(X, Z) T , (15) 

in terms of the covariance matrix S(X) and the cross covariance matrix E(X, Z)\ the definition of E(X|VF © Z) is 
analogous. 

The statistical significance of ([T3]) can be assessed by observing that it is the sum of terms like ([T4]) which, under the 
null hypothesis I (X;W\Z) = 0, have a x 2 distribution. Alternatively, statistical testing may be done using surrogate 
data obtained by random temporal shuffling of the target vector xq; the latter strategy is the one we use in this work. 



III. APPLICATIONS 
A. Second order terms 

In this subsection we show the application of the proposed expansion, truncated at the second order. To this aim 
we turn to real electroencephalogram (EEG) data, the window length m being fixed by cross validation. Firstly we 
consider recordings obtained at rest from 10 healthy subjects. During the experiment, which lasted for 15 min, the 
subjects were instructed to relax and keep their eyes closed. To avoid drowsiness, every minute the subjects were 
asked to open their eyes for 5 s. EEG was measured with a standard 10-20 system consisting of 19 channels |29j . 
Data were analyzed using the linked mastoids reference, and are available from [30]. 

For each subject we consider several epochs of 4 seconds in which the subjects kept their eyes closed. For each 
epoch we compute the second order terms at equal times Zfj and the lagged ones B^; then we average the results 
over epochs. In order to visualize these results, for each target electrode we plot a on a topographic scalp map the 
pairs of electrodes which are redundant or synergetic with respect to it. Both quantities are distributed with a clear 
pattern across the scalp. Interactions at equal times are one order of magnitude higher than the lagged interactions, 
and are dominated by the effect of spatial proximity, see fig. 1. On the other hand, Bf, show a richer dynamics, 
such as interhemispheric communications and predominance redundancy to and from the occipital channels, see fig. 
2, reflecting the prominence of the occipital rhythms when the subjects rest with their eyes closed. 

As another example we consider intracranial EEG recordings from a patient with drug-resistant epilepsy and which 
has thus been implanted with an array of 8 x 8 cortical electrodes and two depth electrodes with six contacts. The 
data are available at [32] and described in [31]. For each seizure data are recorded from the preictal period, the 10 
seconds preceding the clinical onset of the seizure, and the ictal period, 10 seconds from the clinical onset of the 
seizure. We analyze data corresponding to eight seizures and average the corresponding results. 

For each electrode we compute the lagged influences B®a , obtaining for each electrode the pair of other electrodes with 
redundant or synergetic contribution to its future. The patient has a putative epileptic focus in a deep hippocampal 
region, with the seizure that then spreads to the cortical areas. In fig. 3 we report the values of coefficients B taking 
as the target a cortical electrode located on the putative cortical focus: we report the values of Bf, corresponding to 
all the couple of the electrodes, as well as their sum over electrode j. It is clear how the redundancy increases during 
the seizure. On the other hand, for sensors from 70 to 76, corresponding to a depth electrode, the redundancy is 
higher in the preictal period, reflecting the fact that the seizure is already active in its primary focus even if not yet 
clinically observable. The values of B corresponding to this electrode are reported in fig. 4. 

B. Greedy search of multiplets 

Given a target variable, the time required for the exhaustive search of all the subsets of variables, with a statistically 
significant information flow (fl3|h is exponential in the size of the system. It follows that the exact search for large 
multiplets is computationally unfeasible, hence we adopt the following approximate strategy. We start from a pair 
of variables with non-vanishing second order term B w.r.t. the given target. We consider these two variables as a 
seed, and aggregate other variables to them so as to construct a multiplet. The third variable of the subset is selected 
among the remaining ones as those that, jointly with the previously chosen variable, maximize the modulus \C\ of the 
corresponding third order term. Then, one keeps adding the rest of the variables by iterating this procedure. Calling 
Zk-i the selected set of k - 1 variables, the set Z^ is obtained adding, to Zfc_i, the variable, among the remaining 
ones, with the greatest modulus of Qk- These iterations stop when Q&, corresponding to Z&, is not significantly 
different from zero (the Bonferroni correction for multiple comparisons is to be applied at each iteration); Z^-i is 
then recognized as the multiplet originated by the initial pair of variables chosen as the seed. 

We apply this strategy to the following toy model 

x (t) = a rj(t - 1) + cr£ (£), 

x a (t) =b a rj(t)+(T 1 €a(t), a = l,...,ra (16) 

xp{t) =a 2 ^(t), /3 = ra + l,...,ra + M 

where £ and r] are i.i.d. unit variance Gaussian variables. In this model the target xq is influenced by the process 77; 
variables x a , a = 1, . . . , m, are a mixture of 77 and noise £, whilst the remaining M variables are pure noise. Estimates 
of Qk are based on time series, generated from (fT6|) and 1000 samples long. The results are displayed in figure (|5j). 
Firstly we consider the case m = 20 and M = 0, with all the twenty variables driving the target with equal couplings 
b a ; in figure ([5])- A we depict the term Q^ corresponding to the k-th iteration of the greedy search. We note that Cl^ 



has alternating sign and its modulus decreases with k. In figure (J5j)-B we consider another situation, with m = 10 
and M = 10, the ten non-zero couplings b a being non-uniform. Q^ still shows alternating sign, and Q^ vanishes for 
k > 9; hence the multiplet of ten variables is correctly identified. The order of selection is related to the strength of 
the coupling: variables with stronger coupling are selected first. 

In figure ((6]) we consider again the EEG data from healthy subjects with closed eyes [30|, and apply the greedy 
search taking C3 as the target and {C4, C6} as the seed. We find a subset of 9 variables influencing the target. The 
fact that the sign of Qj~ is alternating, as in the previous model, suggests that the channels in this set correspond 
to a single source which is responsible for the inter- hemispheric communication towards the target electrode C3. In 
figure ([7]) we take 01 as the target and {F3, C5} as the seed. A subset of 11 variables is found which describes the 
information flow from the frontal to the occipital cortex. 




FIG. 1: The instantaneous components Zfj for two target electrodes, C3 on the left and 01 on the right. The target electrode 
is in white, and for each of the other electrodes i on the map, the value of Zfj is displayed for the other electrodes. 




FIG. 2: The lagged components B®j for two target electrodes, C3 on the left and 01 on the right. The target electrode is in 
white, and for each of the other electrodes i on the map, the value of B®j is displayed for the other electrodes. 
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FIG. 3: The lagged second order terms B®j for a cortical electrode (in white) right before and during the clinical onset of a 
seizure, and the sum over the second electrode of the pair in the lower right panel. 
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FIG. 4: The lagged second order terms B®j for a depth electrode (in white) right before and during the clinical onset of a 
seizure, and the sum over the second electrode of the pair in the lower right panel. 




FIG. 5: Qk as a function of the multiplet size k for a model in which one variable is influenced by all the other variables or by 
part of them. In (A) m — 20 and M — 0: all the 20 variables influence the target with unitary weight. In (B) m — 10 and 
M = 10; the weights b a are [ 1.75 1.75 1 1 1 1 .5 .5 .5 .5 ]. The insets show the logarithm of the absolute value of H&. The first 
point k — 1, in both plots, represents the initial pair of variables chosen as the seed, i.e. {1,2}. The other parameters are, in 
both cases, a, a, ai, <72 = 0.5. 



IV. CONCLUSIONS 



Summarizing, we have proposed to describe the flow of information, in a system, by means of multiplets of vari- 
ables which send information to each assigned target node. We used a recently proposed expansion of the mutual 
information, between a stochastic variable and a set of other variables, to measure the character and the strength of 
multiplets of variables. Indeed, terms of the proposed expansion put in evidence irreducible sets of variables which 
provide information for the future state of the target channel. The sign of the contributions are related to their in- 
formational character; for the second order terms, synergy and redundancy correspond to negative and positive sign, 
respectively. For higher orders, we have shown that groups of variables, related to the same source of information, 
lead to contributions with alternating signs as the number of variables is increased. A decomposition with similarities 
to the present work have been reported in [33j, where for multiple sources the distinction between unique, redundant, 
and synergistic transfer has been proposed; in [34] the inference of an effective network structure, given a multivariate 
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FIG. 6: Informative contributions to the target electrode C3. Left: information contribute from the resulting multiplet when 
time series from a given electrode are added to the existing multiplet, starting from the pair (C4,C6) which is the one which 
shares the most of information on the future of the target time series. Channels P4, F4, F8, P6, 02, Pz and Cz are recognized 
to belong to the same multiplet as C4 and C6, whilst including Ol leads to a Q& which is not significantly different from zero. 
Right: the absolute value of this contributions plotted on a scalp map. 
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FIG. 7: Informative contributions to the target electrode Ol. Left: information contribute from the resulting multiplet when 
time series from a given electrode are added to the existing multiplet, starting from the pair (F3,C5) which is the one which 
shares the most of information on the future of the target time series. Nine channels (F7, Fz, C3, Cz, Fpl, F4, C4, C6, Fp2) 
are recognized to belong to the same multiplet, for the remaining variables Qk is not significantly different from zero. Right: 
the absolute value of this contributions plotted on a scalp map. 

time series, using incrementally conditioned transfer entropy measurements, has been discussed. The main purpose 
of this paper is to introduce an information based decomposition, and we did that in a framework unifying Granger 
causality and Transfer entropy, thus using a formula which is exact for linear models. In cases in which a nonlinear 
model is required, the entropy has to be computed, requiring a high enough number of time points for statistical 
validation; nonetheless the expansion that we proposed remains valid and exact in both cases. 

We have reported the results of the applications to two EEG examples. The first data set is from resting brains and 
we found signatures of inter-hemispherical communications and frontal to occipital flow of information. Concerning a 
data set from an epileptic subject, our analysis puts in evidence that the seizure is already active, close to the primary 
lesion, before it is clinically observable. 
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